Morphology Driven Manipuri POS Tagger

نویسندگان

  • Thoudam Doren Singh
  • Sivaji Bandyopadhyay
چکیده

A good POS tagger is a critical component of a machine translation system and other related NLP applications where an appropriate POS tag will be assigned to individual words in a collection of texts. There is not enough POS tagged corpus available in Manipuri language ruling out machine learning approaches for a POS tagger in the language. A morphology driven Manipuri POS tagger that uses three dictionaries containing root words, prefixes and suffixes has been designed and implemented using the affix information irrespective of the context of the words. We have tested the current POS tagger on 3784 sentences containing 10917 unique words. The POS tagger demonstrated an accuracy of 69%. Among the incorrectly tagged 31% words, 23% were unknown words (includes 9% named entities) and 8% known words were wrongly tagged.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Will the Identification of Reduplicated Multiword Expression (RMWE) Improve the Performance of SVM Based Manipuri POS Tagging?

Reduplicated Multiword Expressions (RMWEs) are abundant in Manipuri, the highly agglutinative India language. The Part of Speech (POS) tagging of Manipuri using Support Vector Machine (SVM) has been developed and evaluated. The POS tagger has been updated with identified RMWEs as another feature. The performance of the SVM based POS tagger before and after adding RMWE as a feature have been com...

متن کامل

Reduplicated MWE (RMWE) helps in improving the CRF based Manipuri POS Tagger

This paper gives a detail overview about the modified features selection in CRF (Conditional Random Field) based Manipuri POS (Part of Speech) tagging. Selection of features is so important in CRF that the better are the features then the better are the outputs. This work is an attempt or an experiment to make the previous work more efficient. Multiple new features are tried to run the CRF and ...

متن کامل

Morphological Richness Offsets Resource Demand - Experiences in Constructing a POS Tagger for Hindi

In this paper we report our work on building a POS tagger for a morphologically rich languageHindi. The theme of the research is to vindicate the stand thatif morphology is strong and harnessable, then lack of training corpora is not debilitating. We establish a methodology of POS tagging which the resource disadvantaged (lacking annotated corpora) languages can make use of. The methodology mak...

متن کامل

Probabilistic Arabic Part of Speech Tagger with Unknown Words Handling

Part Of Speech (POS) tagger is an essential preprocessing step in many natural language applications. In this paper, we investigate the best configuration of trigram Hidden Markov Model (HMM) Arabic POS tagger when small tagged corpus is available. With small training data, unknown word POS guessing is the main problem. This problem becomes more serious in languages which have huge size of voca...

متن کامل

Part of Speech Tagger for Malay Language Based on Words Morphology

PART OF SPEECH TAGGER FOR MALAY LANGUAGE BASED ON WORDS MORPHOLOGY Mohd Pouzi Hamzah, Syarifah fatem Na’imah Binti Syed Kamaruddin School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia Email: [email protected], [email protected] ABSTRACT : Part of Speech (POS) tagging is an essential task in pre-processing for text process...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008